<!DOCTYPE html>
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
-->
<html>
    <body>
        The package provides fundamental data structures and methods of a Bayesian network.
        It has classes and methods for creating and using Bayesian networks. Inference and learning algorithms are found in {@link bn.alg}.
        
        <h2>Creating a Bayesian network</h2>

        <h3>Define domains and variables</h3>

A variable has values that are defined by a domain. There are currently two domain classes: {@link bn.Enumerable} and {@link bn.Continuous}. Unless you need new data types, you can avoid to refer to them directly. 
The framework supports the development of arbitrary variable types, but several types are pre-defined for easy application. They are available both from your Java code, and via naming.
<br><br>
You get most control by defining a variable "X" by domain, like so:<br>
<pre>
    Enumerable myDomain = new Enumerable(new String[] {"red", "green", "blue"});
    EnumVariable varX = new EnumVariable(myDomain, "X");
</pre>
Most variants can be achieved using pre-defined variable types, like so:<br>
<pre>
    EnumVariable varX = Predef.Nominal(new String[] {"red", "green", "blue"}, "X");
</pre>
You will find a bunch of these factory methods in {@link bn.Predef}. When you want to save and load Bayesian networks to XML files, you must either (a) use only predefined types, or (b) amend the class {@link bn.Predef} with all the details.
<br>
Continuous variables are less constrained since they can be assigned an infinite number of values. They do not have a variable class specific to them. Currently, the easiest way is to use {@link bn.Predef}:<br>
<pre>
    Variable varY = Predef.Real("Y");
</pre>

        <h3>Define nodes</h3>

Once all variables are properly defined, it is time to connect them in the guise of nodes. There is currently two types of nodes, one for <code>EnumVariable</code> and one for <code>Variable&lt;Continuous&gt;</code>.
They are known as {@link bn.CPT} (conditional probability table) and {@link bn.GDT} (Gaussian density table), respectively. They both implement the {@link bn.BNode} interface. (If you want to develop new node types, they need to implement this interface too.)
Currently, any {@link bn.BNode} will have to be based on enumerable parents. (Continuous "parents" make little sense, probabilistically, anyway.)
<br>
Here's an example based on one used in the literature (Russell and Norvig (2003; p. 493-494)):
<pre>
    EnumVariable B = Predef.Boolean("Burglary");
    EnumVariable E = Predef.Boolean("Earthquake");
    Variable S = Predef.Real("Seismic signal");
    EnumVariable A = Predef.Boolean("Alarm");
    EnumVariable J = Predef.Boolean("John calls");
    EnumVariable M = Predef.Boolean("Mary calls");

    CPT b = new CPT(B);
    CPT e = new CPT(E);
    GDT s = new GDT(S,    E);
    CPT a = new CPT(A,    B, E);
    CPT j = new CPT(J,    A);
    CPT m = new CPT(M,    A);

</pre>
The code defines six variables and then defines six nodes, connecting variables B and E as parents to A, E to S, abd variable A as a parent to both J and M. B and E have no parents.
<br>
The parameters of the nodes can be manually assigned, like so:
<pre>
    b.put(new EnumDistrib(Enumerable.bool, 0.001, 0.999));
    e.put(new EnumDistrib(Enumerable.bool, 0.002, 0.998));
    s.put(new GaussianDistrib(6.0, 3.0), true);
    s.put(new GaussianDistrib(2.0, 3.0), false);
    a.put(new EnumDistrib(Enumerable.bool, 0.95, 0.05), true, true);
    a.put(new EnumDistrib(Enumerable.bool, 0.94, 0.06), true, false);
    a.put(new EnumDistrib(Enumerable.bool, 0.29, 0.71), false, true);
    a.put(new EnumDistrib(Enumerable.bool, 0.001, 0.999), false, false);
    j.put(new EnumDistrib(Enumerable.bool, 0.90, 0.10), true);
    j.put(new EnumDistrib(Enumerable.bool, 0.05, 0.95), false);
    m.put(new EnumDistrib(Enumerable.bool, 0.70, 0.30), true);
    m.put(new EnumDistrib(Enumerable.bool, 0.01, 0.99), false);
</pre>
Each assignment is for a single row in a CPT. Two things to think about. What the entry is, and what the "parent" key is. The entry is a distribution of sorts. 
For enumerable variables, it will be a {@link bn.EnumDistrib}, which will want to know what specific domain the distribution is valid for. 
The domain defines the order in which you then provide the probabilities. (In this case we refer to Enumerable.bool, which declares an {@link bn.Enumerable} that lists the values "true", then "false".)
The second part is a list of values assigned to the parents, following the same order that was used when the CPT (or GDT) was defined.

        <h3>Define network</h3>

The final step before you can use the Bayesian network is to package your nodes into the BNet class. This will make sure you can treat the network as a self-contained "unit", ready for saving, inference and training etc. 
You can add them one-by-one, or together.
<pre>
    BNet bn = new BNet();
    bn.add(b, e, s, a, j, m);
</pre>
If want to, you can specify the Bayesian network in an XML file. This will impose some limitations as all variable and node types need to be pre-defined as discussed earlier. 
Indeed, these limitations apply if you want to save your network too. Here's how you could save the definitions above to a file "bn_example.xml".
<pre>
    file.BNBuf.save(bn, "bn_example.xml");
</pre>
If you are interested in how you can write your own XML files, please look in this file. The format is relatively straightforward.

        <h3>Make inference</h3>
        
Inference involves specifying available evidence (observations), and then the variable(s) for which probabilities are sought.
We don't assign observations to variables, but to nodes in the Bayesian network, like so:
<pre>
    j.setInstance(true);
    m.setInstance(true);
</pre>
Of course the values we assign to the nodes must be valid values according to the domains of the variables in the nodes.
Then finally, we use an inference engine (that should implement the interface {@link bn.alg.Inference}) such as {@link bn.alg.VarElim}.
To enable optimisations to occur inside the inference engine, we provide query variables to it, and the engine come up with a useful representation.
(In some applications, the same query may be posed many times, but with different evidence, or vice versa.)
<pre>
    alg.CGVarElim ve = new alg.VarElim();
    ve.instantiate(bn);
    alg.Query q = ve.makeQuery(B);
</pre>
The final step is to let the inference run. It returns a results table with the query variable(s). 
You can extract specific probabilities from the table, using methods defined for this. 
Though some of details here are under development, {@link bn.alg.CGVarElim} returns an instance of {@link bn.alg.CGTable}, which 
provides methods for extracting probability distributions for single variables, both enumerable and continuous.
<pre>
    CGTable qr = (CGTable) ve.infer(q);
    qr.display();
</pre>
To get a broader and more detailed idea of how this code works, a number of examples can be found in the package {@link bn.example}.

    </body>
</html>
